Abstract

This paper is concerned with the problem of recovering a finite, deterministic time
series from observations that are corrupted by additive, independent noise. A distinctive
feature of this problem is that the available data exhibit long-range dependence and, as
a consequence, existing statistical theory and methods are not readily applicable. This
paper gives an analysis of the denoising problem that extends recent work of Lalley,
but begins from first principles. Both positive and negative results are established. The
positive results show that denoising is possible under somewhat restrictive conditions
on the additive noise. The negative results show that, under more general conditions
on the noise, no procedure can recover the underlying deterministic series.

1 Introduction

Recent interest in chaos has drawn the attention of statisticians to deterministic phenomena
that exhibit random behavior. While there is no universally accepted definition of chaos,
phenomena termed "chaotic" have generally been studied in the context of dynamical systems,
which provide mathematical models of physical systems that evolve deterministically
in time. (Good introductions to dynamical systems and chaos for non-specialists can be
found in the texts of Devaney [15] and Alligood et al. jS].) In what follows we will consider
a standard model for dynamical systems, in which the relevant states of the system form
a compact subset $\Lambda$ of $\mathbb{R}^d$. The time evolution of the system is described by an invertible
map $F : \Lambda \to \Lambda$. If at time $i$ the system is in state $x \in \Lambda$, then at time $i + 1$ it is in state
$Fx$, and at time $i - 1$ it is in state $F^{-1}x$. That descriptions of this sort are, in a precise
sense, generic follows from Takens's embedding theorem j34l HI 05]. We do not assume that
$F$ (or $F^{-1}$) is continuous. Starting from an initial state $x \in \Lambda$ at time zero, the complete
time evolution of the system is described by the bi-infinite trajectory

$$\begin{array}{lccccccc}
\text{state} & \cdots & F^{-2}x & F^{-1}x & x & Fx & F^{2}x & \cdots \\
\text{time} & \cdots & -2 & -1 & 0 & 1 & 2 & \cdots
\end{array}$$

Here $F^i$ is the $i$-fold composition of $F$ with itself and $F^{-i}$ is the $i$-fold composition of
$F^{-1}$. This model is deterministic: from exact knowledge of the state of the system at any
point in time, one may reconstruct all the past and future states of the system by repeated
application of $F$ and $F^{-1}$. To simplify notation in what follows, let $x_i = F^i x$, $i \in \mathbb{Z}$, so
that the initial state $x$ of the system is denoted by $x_0$.

To date, most statistical analyses of dynamical systems have been carried out in the 
context of dynamical noise models. In a dynamical noise model, the available observations 
are assumed to be generated according to a nonlinear autoregressive scheme of the form 
$x_{i+1} = F x_i + \eta_i$, where $\{\eta_i\}$ are independent, mean zero random vectors. In this model,
random noise is "folded" into the dynamics at each step, and the resulting sequence of states
$x_i$ is not purely deterministic. In the presence of dynamical noise, the observed states form
a discrete time, continuous state Markov chain, and estimating interesting features of the
dynamics (e.g. the map F) can often be accomplished in part by an appeal to traditional 
time series techniques. Representative work can be found in references |371 1191 1111 1121 1231 
1311126105] . An alternative approach to the map estimation problem is described in |3Uj . 

Of interest here is the so-called observational noise model, in which the available data are 
observations (or measurements) of an underlying deterministic system that are corrupted 
by additive noise. In this model our observations take the form $y_i = x_i + \varepsilon_i$, where $\{\varepsilon_i\}$ are
independent, mean zero random vectors. In contrast with the dynamical noise model, the 
noise does not interact with the dynamics: the deterministic character of the system, and 
its long range dependence, are preserved beneath the noise. Due in part to this dependence, 
estimation in the observational noise model has not been broadly addressed by statisticians, 
though the model captures important features of many experimental situations. Here we 
are interested in the problem of how to recover the underlying time series $\{x_i\}$ from the
observations $\{y_i\}$.

Denoising problem: Reconstruct the successive states $x_0, \ldots, x_n$ of the deterministic
system from observations of the form

$$y_i = x_i + \varepsilon_i = F^i x_0 + \varepsilon_i, \qquad 0 \le i \le n, \tag{1}$$

where $\varepsilon_0, \ldots, \varepsilon_n \in \mathbb{R}^d$ are independent random vectors with mean zero.
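
The observational noise model can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the Henon map with parameters 1.4 and 0.3, the starting point (0, 0), and noise drawn uniformly from a small square are our choices, not part of the paper. The function names `henon` and `noisy_trajectory` are hypothetical.

```python
import random

def henon(state, a=1.4, b=0.3):
    """One step of the Henon map (an illustrative choice of invertible map F)."""
    x, y = state
    return (1.0 - a * x * x + y, b * x)

def noisy_trajectory(x0, n, noise_bound, seed=0):
    """Return (states, observations) with y_i = x_i + eps_i for i = 0..n.

    The noise is independent, mean zero, and bounded: each coordinate is
    uniform on [-noise_bound, noise_bound] (an assumption of this sketch).
    """
    rng = random.Random(seed)
    states = [x0]
    for _ in range(n):
        # The states are purely deterministic: x_{i+1} = F x_i.
        states.append(henon(states[-1]))
    observations = [
        (sx + rng.uniform(-noise_bound, noise_bound),
         sy + rng.uniform(-noise_bound, noise_bound))
        for sx, sy in states
    ]
    return states, observations

states, observations = noisy_trajectory((0.0, 0.0), n=1000, noise_bound=0.05)
```

Note that the noise is added after the trajectory has been generated, so it never feeds back into the dynamics. This is the feature that separates the observational noise model from the dynamical noise model described above.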

Several versions of the denoising problem, and associated methods, have previously been
considered by a number of authors, including Kostelich and Yorke [23], Davies [14], Sauer
[32], and Kostelich and Schreiber j^. The methods and results described here are motivated
by recent work of Lalley [24, 25]. MacEachern and Berliner ^f\ studied the problem of
distinguishing trajectories in the observational noise model when the noise distribution comes
from a suitable exponential family and established the asymptotic normality of relevant
likelihood ratios.

Though some features of denoising can be found in more traditional statistical problems 
such as errors in variables regression, deconvolution, and measurement error modeling (cf.
[10]), other features distinguish it from these problems and require new methods of analysis.
For example, in the denoising problem the covariates $x_i$ are deterministically related (not
i.i.d. or mixing), the noise $\varepsilon_i$ is often bounded (not Gaussian), and the noise distribution is
usually unknown. 

In the denoising problem the underlying states of the observed deterministic system 
are of primary interest. Denoising methods can also provide useful preprocessing for other 
statistical analyses. In the absence of noise, and under appropriate regularity conditions, 
$x_0, x_1, \ldots$ can be used to estimate the map $F$ (HIHlEni, its invariant measure, entropy,
and Lyapunov exponents !T8^, or the fractal dimension of its attractor (see [13]). When
observational noise is present, consistent reconstructions $\hat{x}_0, \ldots, \hat{x}_n$ can sometimes act as
surrogates for the unobserved states in estimation problems of this sort. The surveys |17|
l6| I2fl| I2T] give an account of statistical problems in the study of dynamical systems. Formal
limits to statistical inference from dependent processes can be found in j2[EI2n]. From
the viewpoint of statistical practice and theory, it is interesting to ask whether estimation 
is still possible when noise removal is not, but we will not address such issues here. 

2 Summary 

The next section contains several preliminary definitions and results that will be used
throughout the paper. Section 4 describes two denoising procedures. The consistency
of these procedures is established in Theorems 1 and 2 under a boundedness assumption on
the noise. It is shown in Section 5 that, in a variety of settings, consistent denoising is not
possible when this assumption is significantly relaxed. Proofs of the positive (consistency)
results are given in Section 6; proofs of the negative results are given in Section 7.



3 Preliminaries 

Throughout this paper we assume that $F : \Lambda \to \Lambda$ is an invertible map of a compact
set $\Lambda \subseteq \mathbb{R}^d$. Of primary interest are maps that possess an elementary form of sensitive
dependence on initial conditions. Recall that $F$ is said to be expansive if there exists $\Delta > 0$
such that for every pair of vectors $x, x' \in \Lambda$ with $x \neq x'$,

$$\sup_{s \in \mathbb{Z}} |F^s x - F^s x'| > \Delta.$$

The constant $\Delta$ is called a separation threshold for $F$. If $F$ is expansive then, beginning
from any two distinct initial states, the corresponding bi-infinite trajectories of $F$ will, at
some (possibly negative) time $i$, be more than $\Delta$ apart. Note that the separation threshold $\Delta$
does not depend on $x$ or $x'$.

Definition: Let $F$ be an expansive map with separation threshold $\Delta > 0$. The separation
time for $x \neq x'$ is

$$s(x, x') = \min\{|s| : |F^s x - F^s x'| > \Delta\}.$$

For each $\alpha > 0$ define the separation horizon

$$H(\alpha) = \sup\{s(x, x') : |x - x'| \ge \alpha\}.$$

Note that $\alpha < \alpha'$ implies $H(\alpha) \ge H(\alpha')$. If $H(\alpha) < \infty$ for every $\alpha > 0$, then $F$ will
be said to have finite separation horizon.
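
The separation time can be computed directly from its definition by iterating the map forward and backward until the two trajectories are more than the threshold apart. The following is a minimal sketch, assuming the Arnold cat map on the unit square as the invertible map (our illustrative choice, not an example from the paper). The names `cat_map`, `cat_map_inverse`, `separation_time` and the cutoff `max_steps` are hypothetical.

```python
import math

def cat_map(p):
    """Arnold cat map on [0, 1)^2 (illustrative invertible map)."""
    x, y = p
    return ((2.0 * x + y) % 1.0, (x + y) % 1.0)

def cat_map_inverse(p):
    """Inverse of cat_map, from the inverse matrix [[1, -1], [-1, 2]]."""
    x, y = p
    return ((x - y) % 1.0, (-x + 2.0 * y) % 1.0)

def separation_time(p, q, threshold, forward, backward, max_steps=200):
    """Smallest |s| with |F^s p - F^s q| > threshold.

    Returns None if no such s is found within max_steps iterations,
    which is a practical cutoff added for this sketch.
    """
    fp, fq = p, q  # forward iterates F^s p and F^s q
    bp, bq = p, q  # backward iterates F^{-s} p and F^{-s} q
    for s in range(max_steps + 1):
        if math.dist(fp, fq) > threshold or math.dist(bp, bq) > threshold:
            return s
        fp, fq = forward(fp), forward(fq)
        bp, bq = backward(bp), backward(bq)
    return None

# Two nearby initial states separate after a moderate number of steps.
s = separation_time((0.2, 0.3), (0.2 + 1e-6, 0.3), 0.25, cat_map, cat_map_inverse)
```

The separation horizon is the supremum of such times over all pairs at least a given distance apart, so it cannot be computed by a finite search in general. The function above only evaluates one pair at a time.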

Proposition 1 If $F$ has finite separation horizon then the inverse function

$$H^{-1}(k) = \inf\{\alpha > 0 : H(\alpha) \le k\} \tag{2}$$

tends monotonically to zero as $k \to \infty$.

Proof: The monotonicity of $H^{-1}$ follows from that of $H$. If $H^{-1}(k) > \alpha_0 > 0$ for every $k$,
then $H(\alpha) = +\infty$ for $\alpha < \alpha_0$, which contradicts the assumption of finite separation horizon.

If $F : \Lambda \to \Lambda$ is invertible and continuous, then $F^{-1}$ is continuous and $F$ is a
homeomorphism (see, e.g. [35]). An elementary argument shows that every expansive homeomorphism
has finite separation horizon.

Lemma 1 If $F : \Lambda \to \Lambda$ is an expansive homeomorphism, then $F$ has finite separation
horizon.



Proof: Let $\Delta > 0$ be a separation threshold for $F$. If $H(\alpha) = +\infty$ for some $\alpha > 0$ then
there exist pairs of states $(x_n, x'_n) \in \Lambda \times \Lambda$, $n \ge 1$, such that $|x_n - x'_n| \ge \alpha$ for each $n$ and
$s(x_n, x'_n) \to \infty$. As $\Lambda$ is compact, there exist integers $n_1 < n_2 < \cdots$ and points $x, x' \in \Lambda$
such that $x_{n_k} \to x$ and $x'_{n_k} \to x'$. Clearly $|x - x'| \ge \alpha$. Moreover, as $F$ is continuous and
$s(x_{n_k}, x'_{n_k}) \to \infty$, for each $m \ge 1$,

$$\max_{|s| \le m} |F^s x - F^s x'| = \lim_{k \to \infty} \max_{|s| \le m} |F^s x_{n_k} - F^s x'_{n_k}| \le \Delta.$$

It follows that $H(\alpha) \ge s(x, x') = \infty$, which is a contradiction.

3.1 Ergodic Transformations 

Ergodic Transformation: Let $\mu$ be a probability measure on the Borel subsets of $\Lambda$. A
map $F : \Lambda \to \Lambda$ is said to preserve $\mu$ if $\mu(F^{-1}B) = \mu(B)$ for each Borel set $B \subseteq \Lambda$. A
$\mu$-preserving map $F$ is said to be ergodic if $F^{-1}B = B$ implies $\mu(B) \in \{0, 1\}$, i.e. every
$F$-invariant set has $\mu$-measure zero or one.

The ergodic theorem generalizes the ordinary law of large numbers and is an important 
tool in understanding the asymptotic behavior of dynamical systems. It states that the 
time average of a real-valued measurement along the trajectory of an ergodic map F will 
converge to the space average of that measurement. 

Theorem A (Ergodic Theorem) If $F : \Lambda \to \Lambda$ is $\mu$-preserving and ergodic, and $f : \Lambda \to \mathbb{R}$
is such that $\int |f| \, d\mu < \infty$, then $n^{-1} \sum_{i=0}^{n-1} f(F^i x) \to \int f \, d\mu$ with probability one and in
mean.
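
The convergence of time averages to space averages is easy to observe numerically. The sketch below uses an irrational rotation of the unit interval, which preserves Lebesgue measure and is ergodic. This example and the test function are our own illustrative choices rather than anything taken from the paper, and the name `time_average` is hypothetical.

```python
import math

def time_average(f, step, x0, n):
    """Average of f along the first n points of the trajectory of x0."""
    x, total = x0, 0.0
    for _ in range(n):
        total += f(x)
        x = step(x)
    return total / n

# Rotation by the golden-ratio conjugate (an irrational angle).
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    return (x + ALPHA) % 1.0

def measurement(x):
    # The space average of cos^2(2 pi x) over [0, 1) is 1/2.
    return math.cos(2.0 * math.pi * x) ** 2

avg = time_average(measurement, rotation, 0.3, 100_000)
```

Here the time average `avg` should be close to the space average 1/2. For a general ergodic map the theorem guarantees convergence only for almost every initial point, and says nothing about the rate.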

4 Consistent Denoising 

In this section we describe two consistent denoising methods for deterministic time series, 
and provide a preliminary analysis of their theoretical performance. 

4.1 Smoothing Algorithm D 

We first describe a denoising method originally proposed by Lalley |11], called Smoothing
Algorithm D. Let the available data be a sequence of vectors $y_0, \ldots, y_n$ defined as in (1),
and let $k$ be a positive integer less than $\log n$. For each $l = k, \ldots, n - k$ define the index set

$$A_n(l, k) = \{ j : |y_{j+r} - y_{l+r}| \le 3\Delta/5 \text{ for } |r| \le k \}. \tag{3}$$

Note that $l \in A_n(l, k)$ so that $A_n(l, k)$ is always non-empty. For $l = k, \ldots, n - k$ define the
denoising estimate

$$\hat{x}_{l,n} = \frac{1}{|A_n(l, k)|} \sum_{j \in A_n(l, k)} y_j \tag{4}$$

of $x_l$; set $\hat{x}_{l,n} = 0$ for other values of $l$. To see how the estimate is constructed, let
$w(j, k) = (y_{j-k}, \ldots, y_{j+k})$ contain the observations in a window of length $2k + 1$ centered
at $y_j$. The estimate $\hat{x}_{l,n}$ of $x_l$ is obtained by averaging all those values $y_j$ for which $w(j, k)$
is close, on a term by term basis, to $w(l, k)$.
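
The construction just described translates almost line by line into code. The following is a minimal, unoptimized sketch of the window-matching and averaging steps, not the authors' implementation. The function name `smoothing_algorithm_d` is hypothetical, and restricting the candidate indices j to the range k, ..., n - k (so that every window is fully observed) is an assumption of this sketch.

```python
import math

def smoothing_algorithm_d(y, k, delta):
    """Naive sketch of Smoothing Algorithm D.

    y     : list of observation tuples y_0, ..., y_n
    k     : half-width of the matching window
    delta : separation threshold of the underlying map
    Returns a list of estimates; entries outside k..n-k are zero vectors.
    """
    n = len(y) - 1
    dim = len(y[0])
    tol = 3.0 * delta / 5.0
    estimates = [(0.0,) * dim for _ in y]
    for l in range(k, n - k + 1):
        # Index set: windows that agree with the window at l term by term.
        matches = [
            j for j in range(k, n - k + 1)
            if all(math.dist(y[j + r], y[l + r]) <= tol for r in range(-k, k + 1))
        ]
        # l always matches itself, so the average is well defined.
        estimates[l] = tuple(
            sum(y[j][c] for j in matches) / len(matches) for c in range(dim)
        )
    return estimates

# Toy check: a period-two signal with small bounded perturbations.
toy = [((10.0 if i % 2 else 0.0) + 0.1 * (-1) ** (i // 2),) for i in range(20)]
toy_estimates = smoothing_algorithm_d(toy, k=1, delta=1.0)
```

In the toy example only observations in the same phase of the cycle match, so averaging over them shrinks the perturbation. The double loop makes the quadratic cost in the number of observations visible.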

Theorem 1 Let $F$ be an expansive map with separation threshold $\Delta > 0$ and finite
separation horizon. Suppose that $|\varepsilon_i| \le \Delta/5$ for each $i \ge 0$. If $k \to \infty$ and $k/\log n \to 0$
then

$$\frac{1}{n - 2k} \sum_{l=k}^{n-k} |\hat{x}_{l,n} - x_l| \to 0 \quad \text{as } n \to \infty$$

with probability one for every initial vector $x \in \Lambda$.

The in-probability consistency of Smoothing Algorithm D was first established in Theorem 1
of ^2 under the condition that $F$ is a C^-diffeomorphism and $\Lambda$ is a hyperbolic
attractor (or the basin of attraction of such a set). A more general result for expansive
homeomorphisms is stated in Theorem 2 of [25]. Here these conditions are replaced by the
weaker assumption of finite separation horizon, and in-probability convergence is strengthened
to convergence with probability one. The proof of Theorem 1 is given in Section 6.

4.2 Implementation 

A naive implementation of Smoothing Algorithm D has running time $O(n^2)$, where $n$ denotes
the number of available observations. More efficient, approximate, versions of the algorithm
with running time $O(n \log n)$ are investigated in [25]. In simulations, Algorithm D and its
approximations have been used to successfully remove noise from trajectories of the logistic
map, the Hénon attractor, and Smale's solenoid. Informal studies have illustrated the failure
of the algorithm to remove uniform noise whose support is comparable to the diameter of
the associated attractor. These simulations lend empirical support to Theorem 1 and the
negative results discussed below.



4.3 Preliminary Analysis 

Smoothing Algorithm D removes observation noise from the trajectory of an expansive map
by judicious averaging. To understand why Theorem 1 holds, fix $l$ between $k$ and $n - k$.
Together (1) and (4) imply that

$$|\hat{x}_{l,n} - x_l| \le \frac{\sum_{j \in A_n(l,k)} |x_l - x_j|}{|A_n(l,k)|} + \frac{\left| \sum_{j \in A_n(l,k)} \varepsilon_j \right|}{|A_n(l,k)|}. \tag{5}$$

The first term on the right hand side of (5) controls the bias of the estimate $\hat{x}_{l,n}$, and the
second controls its stochastic variation. Regarding the bias, note that

$$\begin{aligned}
j \in A_n(l,k) &\Rightarrow |y_{j+r} - y_{l+r}| \le 3\Delta/5 \text{ for } 1 \le |r| \le k \\
&\Rightarrow |x_{j+r} - x_{l+r}| \le \Delta \text{ for } 1 \le |r| \le k \\
&\Rightarrow k < H(|x_l - x_j|) \Rightarrow |x_l - x_j| \le H^{-1}(k).
\end{aligned}$$

Thus (5) implies that

$$|\hat{x}_{l,n} - x_l| \le H^{-1}(k) + \frac{\left| \sum_{j \in A_n(l,k)} \varepsilon_j \right|}{|A_n(l,k)|}. \tag{6}$$

This yields the following bound on the average denoising error:

$$\frac{1}{n - 2k} \sum_{l=k}^{n-k} |\hat{x}_{l,n} - x_l| \le H^{-1}(k) + \frac{1}{n - 2k} \sum_{l=k}^{n-k} \frac{\left| \sum_{j \in A_n(l,k)} \varepsilon_j \right|}{|A_n(l,k)|}. \tag{7}$$

The upper bound $H^{-1}(k)$ on the average bias depends on the map $F$ and the window width
$k$, but is independent of $n$ and $l$. Moreover, $H^{-1}(k) \to 0$ by Proposition 1 as $F$ has finite
separation horizon and $k \to \infty$. Analysis of the stochastic variation is complicated by the
fact that the $\varepsilon_j$ are not independent when summed over the random index set $A_n(l, k)$. The
details are given in the appendix (see in particular inequality (24) and Lemma EJ).

The analysis above suggests a more adaptive version of Smoothing Algorithm D that
offers improved performance under somewhat stronger conditions. Fix $l$ for the moment
and consider inequality (6). It can be seen that the window width $k$ plays a role analogous
to inverse bandwidth in kernel type estimators. Monotonicity of $H^{-1}$ ensures that the bias
of $\hat{x}_{l,n}$ decreases as $k$ increases. On the other hand, as $k$ increases, the index set $A_n(l, k)$
gets smaller, and the variability of the estimate will increase as one averages over fewer noise
variables $\varepsilon_j$. One modification of Smoothing Algorithm D, analogous to local bandwidth
selection, is to adaptively select a window width for each location $l$. This is considered in
more detail below.



4.4 Denoising with a Variable Length Window 

Here new denoising estimates $\tilde{x}_{l,n}$ are described. Let the index sets $A_n(l, k)$ be defined as
in (3). The new estimates are based on windows whose widths are chosen adaptively to
ensure that $|A_n(l, k)|$ is sufficiently large. For $l = \log n, \ldots, n - \log n$ define

$$k_{l,n} = \max\{1 \le k \le \log n : |A_n(l, k)| \ge n/\log n\}, \tag{8}$$

and set $k_{l,n} = 0$ if $|A_n(l, 1)| < n/\log n$. For the same values of $l$, define denoising estimates

$$\tilde{x}_{l,n} = \frac{1}{|A_n(l, k_{l,n})|} \sum_{j \in A_n(l, k_{l,n})} y_j. \tag{9}$$

Set $\tilde{x}_{l,n} = 0$ if $k_{l,n} = 0$. Strong consistency of the estimates $\tilde{x}_{l,n}$ requires that the trajectory
under study exhibit a natural recurrence property.
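
The adaptive choice of window width can be sketched as a search over k at a single location l: the width is increased for as long as the number of matching windows stays at or above n / log n. The code below is an illustrative sketch under our own assumptions, not the authors' implementation. The names `index_set` and `adaptive_estimate` are hypothetical, candidate indices are restricted to k, ..., n - k, and the search stops at the first width that fails, which is valid because the index sets shrink as k grows.

```python
import math

def index_set(y, l, k, tol):
    """Indices j in k..n-k whose length-(2k+1) window matches the one at l."""
    n = len(y) - 1
    return [
        j for j in range(k, n - k + 1)
        if all(math.dist(y[j + r], y[l + r]) <= tol for r in range(-k, k + 1))
    ]

def adaptive_estimate(y, l, delta):
    """Return (chosen width, estimate) at location l; width 0 means no estimate."""
    n = len(y) - 1
    log_n = math.log(n)
    tol = 3.0 * delta / 5.0
    dim = len(y[0])
    chosen_k, chosen = 0, []
    for k in range(1, int(log_n) + 1):
        if l - k < 0 or l + k > n:
            break  # window would run off the observed data
        matches = index_set(y, l, k, tol)
        if len(matches) < n / log_n:
            break  # index sets are nested in k, so larger widths also fail
        chosen_k, chosen = k, matches
    if chosen_k == 0:
        return 0, (0.0,) * dim
    estimate = tuple(sum(y[j][c] for j in chosen) / len(chosen) for c in range(dim))
    return chosen_k, estimate
```

For a constant sequence every window matches, so the largest admissible width is selected. For a sequence whose windows never recur, the width is zero and the estimate is set to zero, in line with the convention stated above.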

Definition: A point $x \in \Lambda$ with trajectory $x_i = F^i x$ will be called strongly recurrent if
there is a finite cover $\mathcal{O}$ of $\Lambda$ such that (i) every $O \in \mathcal{O}$ has diameter less than $\Delta/5$, and
(ii) for each $r \ge 1$ and each choice of sets $O_1, \ldots, O_r \in \mathcal{O}$ either

$$\sum_{i=0}^{\infty} I\{x_i \in O_1, \ldots, x_{i+r-1} \in O_r\} < \infty \tag{10}$$

or

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} I\{x_i \in O_1, \ldots, x_{i+r-1} \in O_r\} > 0. \tag{11}$$

Conditions (10) and (11) ensure that if the forward trajectory of $F$ starting from $x$ visits
the product set $O_1 \times \cdots \times O_r$ infinitely often, then it does so a non-negligible fraction of
the time.

Recall that $F$ is said to preserve a probability measure $\mu$ on the Borel subsets of $\Lambda$ if
$\mu(F^{-1}B) = \mu(B)$ for each Borel set $B \subseteq \Lambda$, and that a $\mu$-preserving map $F$ is said to be
ergodic if $F^{-1}B = B$ implies $\mu(B) \in \{0, 1\}$, i.e. every $F$-invariant set has $\mu$-measure zero
or one. Strongly recurrent points are the norm in measure preserving systems.

Proposition 2 If $F$ preserves a measure $\nu$ on $\Lambda$ and is ergodic then $\nu$-almost every $x \in \Lambda$
is strongly recurrent.

Proof: Let $\mathcal{O}$ be any finite open cover of $\Lambda$ by sets having diameter less than $\Delta/5$. Fix sets
$O_1, \ldots, O_r \in \mathcal{O}$. Note that $x_i \in O_1, \ldots, x_{i+r-1} \in O_r$ if and only if $x_i = F^i x \in O'$ where
$O' = \bigcap_{j=1}^{r} F^{-j+1} O_j$. If $\nu(O') > 0$, the ergodic theorem ensures that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} I\{x_i \in O_1, \ldots, x_{i+r-1} \in O_r\} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} I\{F^i x \in O'\} = \nu(O') > 0$$

with $\nu$-probability one. On the other hand, if $\nu(O') = 0$ then $\sum_{i=0}^{\infty} \nu\{x : F^i x \in O'\} = 0$ and
consequently $\nu\{F^i x \in O' \text{ infinitely often}\} = 0$ by the first Borel Cantelli lemma.

Theorem 2 Let $F$ be an expansive map with separation threshold $\Delta > 0$ and finite
separation horizon. If $|\varepsilon_i| \le \Delta/5$ for each $i \ge 0$, then for every strongly recurrent initial vector
$x \in \Lambda$,

$$\max\{ |\tilde{x}_{l,n} - x_l| : \log n \le l \le n - \log n \} \to 0$$

with probability one as $n$ tends to infinity.

Performance bounds of this sort for Smoothing Algorithm D are established in ^1] under
the stronger assumption that $F$ is a C^-diffeomorphism and that $\Lambda$ is an Axiom A basic
set.

5 Negative Results 

One distinctive (and restrictive) feature of Theorems 1 and 2 is the assumption that the
noise $\varepsilon_i$ is bounded in absolute value by a fraction of the separation threshold $\Delta$. In light
of the popularity and widespread study of Gaussian noise, it is natural to ask if denoising
is possible when the $\varepsilon_i$ are normally distributed, perhaps under some constraints on the
component-wise variances. Surprisingly, the answer is often "no". Lalley ^| shows that
for many smooth dynamical systems no scheme can successfully remove Gaussian noise, even
in the weak sense of Theorem 1. In this section we extend and generalize this result. Our
proof covers the Gaussian case, generalizations of the Gaussian case to noise distributions
supported on all of $\mathbb{R}^d$ (stated in |2^), and the case of noise distributions with bounded
support.

Suppose, as in the previous section, that $\{x_i = F^i x : i \in \mathbb{Z}\}$ is the trajectory of a fixed
initial vector $x \in \Lambda$, and that observations of $x_i$ are subject to additive noise, and can be
modeled as random vectors

$$y_i = x_i + \varepsilon_i, \qquad i \in \mathbb{Z}, \tag{12}$$

where $\ldots, \varepsilon_{-1}, \varepsilon_0, \varepsilon_1, \ldots \in \mathbb{R}^d$ are independent, mean-zero random vectors having a
common distribution $\eta$ on $\mathbb{R}^d$. We assume in what follows that the $\varepsilon_i$ are defined on a common
underlying probability space $(\Omega, \mathcal{F}, P)$. Of interest here are several related problems, which
may be informally expressed as follows.

Problem 1: Identify the initial state $x \in \Lambda$ from observation of the infinite
sequence $\{y_i : i \in \mathbb{Z}\}$.

Problem 2: Consistently identify the initial state $x \in \Lambda$ from observations
$y_{-n}, \ldots, y_n$ in the limit as $n \to \infty$.

Problem 3: Estimate the states $x_1, \ldots, x_n \in \Lambda$ from observation of $y_1, \ldots, y_n$.

It is evident that Problem 1 is easier than Problem 2, as in the former we have access to all
the available data at the outset. It is also clear that an answer to Problem 2 might be used,
in conjunction with shifts of the observations, to answer Problem 3. Problem 3 is just the
denoising problem considered in the previous section.

It is shown in Theorem 3 below that for distinguished states $x$ and noise distributions $\eta$,
neither Problem 1 nor Problem 2 has a solution. This negative result is then used to establish
Theorem 4, which states that, for suitable dynamical maps $F$ and noise distributions $\eta$,
consistent denoising is impossible.

5.1 Distributional Assumptions

The negative results in Theorems 3 and 4 require that the distribution $\eta$ of the $\varepsilon_i$ be smooth
and have sufficiently large support. Here we give a precise statement of these conditions.

Suppose first that $\eta$ is absolutely continuous, having a density $f$ with respect to
$d$-dimensional Lebesgue measure $\lambda$. Recall that if $A$ is a Borel subset of $\mathbb{R}^d$, $u \in \mathbb{R}^d$ is any
vector and $r > 0$, then

$$A + u = \{v + u : v \in A\} \quad \text{and} \quad A^r = \{u : |u - v| < r \text{ for some } v \in A\}$$

are also Borel subsets of $\mathbb{R}^d$. For $v \in \mathbb{R}^d$ and $r > 0$, let $B(v, r) = \{u : |u - v| < r\}$ be
the Euclidean ball of radius $r$ centered at $v$. Let $S = \{v : f(v) > 0\}$ be the support of
the density $f$ of $\eta$. Let $\bar{S}$ and $S^\circ$ denote the closure and interior of $S$, respectively, and let
$\partial S = \bar{S} \setminus S^\circ$ be its boundary. Finally, let $\rho = \max\{|u - v| : u, v \in \Lambda\}$ be the diameter of $\Lambda$.
Note that $\rho$ is finite as $\Lambda$ is compact. We make the following assumptions concerning $\eta$:

$$\limsup_{|z| \to 0} \frac{1}{|z|} \int \left| \log \frac{f(w + z)}{f(w)} \right| f(w) \, dw < \infty, \tag{13}$$

$$\limsup_{r \to 0} \frac{1}{r} \, \eta((\partial S)^r) < \infty, \quad \text{and} \tag{14}$$

$$S \supseteq B(0, 3\rho/2). \tag{15}$$
Assumption (13) states that $\log f$ is Lipschitz continuous on the average. Assumption (14)
says that the measure of those points within distance $r$ of $\partial S$ decreases at least linearly
with $r$. Assumption (15) states that $S$ contains a sphere whose radius is significantly larger
than the diameter $\rho$ of $\Lambda$. It is enough that assumptions (14) and (15) hold for some version
$f$ of $d\eta/d\lambda$. Note that (14) and (15) are trivially satisfied if $S = \mathbb{R}^d$.

Example 1: If $\eta$ is multivariate Gaussian and has a covariance matrix of full rank, then
assumptions (14) and (15) are immediate, and one may readily verify that assumption (13)
holds.

Example 2: Suppose that $\eta$ has a density $f$ with compact support $S$ satisfying (15), and
suppose further that $f$ is Lipschitz continuous on $S$. Then $f$ is bounded away from zero and
infinity on $S$ and one may verify that (13) holds. Satisfaction of (14) requires, in addition,
that the boundary of $S$ be regular. To quantify this, let $N(\partial S, r)$ denote the least number
of Euclidean balls of radius $r > 0$ needed to cover $\partial S$. If $N(\partial S, r) \le c \, (1/r)^{d-1}$ for some
$c < \infty$ and each $0 < r < r_0$, then

$$\eta((\partial S)^r) \le c' \sup_{x \in S} |f(x)| \, N(\partial S, r) \, r^d \le c' c \sup_{x \in S} |f(x)| \cdot r$$

for a suitable normalizing constant $c'$, and (14) follows. The bound $N(\partial S, r) \le c \, (1/r)^{d-1}$
implies, in particular, that the box counting dimension of $\partial S$ is $d - 1$. Assumption (14)
is satisfied, for example, if $\eta$ is the uniform distribution on $B(0, 3\rho/2)$, or the uniform
distribution on a cube of side length $3\rho/2$ centered at the origin.

5.2 Homoclinic Pairs 

Let $x$ and $x'$ be distinct initial states in $\Lambda$, with corresponding trajectories $\{x_i = F^i x : i \in \mathbb{Z}\}$
and $\{x'_i = F^i x' : i \in \mathbb{Z}\}$. Suppose that we wish to distinguish $x$ and $x'$ on the basis of their
trajectories. In the absence of noise, and with knowledge of $F$, this is always possible: from
observation of any $x_i$ one can recover $x$, and from observation of any $x'_i$ one can recover
$x'$. However, when observation noise is present, this simple inversion process is no longer
applicable. Recall that $y_i = x_i + \varepsilon_i$, $i \in \mathbb{Z}$, are noisy observations of the trajectory of $x$. Let

$$y'_i = x'_i + \varepsilon_i, \qquad i \in \mathbb{Z}, \tag{16}$$

be observations of the trajectory of $x'$, corrupted by the same additive noise sequence.
Define $\mathcal{X}$ to be the set of all bi-infinite sequences $v = \ldots, v_{-1}, v_0, v_1, \ldots$ with $v_i \in \mathbb{R}^d$, and
let $\mathcal{S}$ be the product sigma field for $\mathcal{X}$ generated by the finite dimensional Borel cylinder
sets. For fixed $x, x'$ the sequences

$$y = (\ldots, y_{-1}, y_0, y_1, \ldots) \quad \text{and} \quad y' = (\ldots, y'_{-1}, y'_0, y'_1, \ldots)$$

are random elements of $(\mathcal{X}, \mathcal{S})$, defined on the underlying probability space $(\Omega, \mathcal{F}, P)$.
Consider the following special case of Problem 1 above.

Question 1: Is there a measurable function $\phi : \mathcal{X} \to \mathbb{R}^d$ such that $\phi(y) = x$
and $\phi(y') = x'$ with probability one?

Intuitively, it will be more difficult to identify $x$ and $x'$ in the presence of noise if their
trajectories stay close to each other across time. The notion of a strongly homoclinic pair
is one way of making this precise.

Definition: A pair $(x, x')$ of distinct states in $\Lambda$ is said to be strongly homoclinic for $F$ if
their bi-infinite trajectories are such that

$$\sum_{i \in \mathbb{Z}} |F^i x - F^i x'| < \infty. \tag{17}$$

As noted in |21], homoclinic pairs exist and are common in many smooth dynamical
systems. It is worth noting that the existence of a separation threshold does not preclude
the existence of homoclinic pairs, as the separation of $F^i x$ and $F^i x'$ need only occur for one
value of $i$. Theorem 3 below shows that the answer to Question 1 is "no" when $x$ and $x'$
are strongly homoclinic. The proof is given in Section 7.

Theorem 3 Suppose that the distribution $\eta$ of $\varepsilon_i$ satisfies conditions (13) - (15). If $x$ and
$x'$ are strongly homoclinic, then for every measurable function $\phi : \mathcal{X} \to \mathbb{R}^d$,

$$E\left[ \, |\phi(y) - x| + |\phi(y') - x'| \, \right] > 0.$$

Remark: Among the functions $\phi$ included in the theorem are those that incorporate
knowledge of the dynamical map and the two possible initial states. Thus even with knowledge
of $\{x, x'\}$ and $F$, and even with access to the entire noisy trajectory, one cannot recover the
initial state of the system with certainty.

5.3 Negative Results for Denoising 

Suppose now that $F : \Lambda \to \Lambda$ preserves a Borel measure $\mu$ on $\Lambda$ and is ergodic. Let $X \sim \mu$
be independent of $\{\varepsilon_i\}$ and define

$$X_i = F^i X, \qquad Y_i = X_i + \varepsilon_i, \qquad i \in \mathbb{Z}, \tag{18}$$

where the $\varepsilon_i$ are i.i.d. with distribution $\eta$. Then $\{(X_i, Y_i) : i \in \mathbb{Z}\}$ is a stationary ergodic
process taking values in $\mathbb{R}^d \times \mathbb{R}^d$. Our principal negative result applies to dynamical systems
that admit a homoclinic coupling.

Definition: A $\mu$-preserving transformation $F : \Lambda \to \Lambda$ admits a homoclinic coupling if on
some probability space one may define random vectors $X$ and $X'$ such that

1. $X$ and $X'$ take values in $\Lambda$

2. $X$ and $X'$ have distribution $\mu$

3. $(X, X')$ is strongly homoclinic for $F$ with positive probability.

For systems admitting a homoclinic coupling, strongly homoclinic pairs are relatively
common. When a homoclinic coupling exists we may ensure, by means of a standard product
construction, that the pair $(X, X')$ is defined on the same probability space as, and is
independent of, the noise variables $\varepsilon_i$. It is shown in |21] that many common models of smooth
dynamical systems, for example uniformly hyperbolic (and Axiom A) C^-diffeomorphisms,
admit homoclinic couplings.

Definition: A denoising procedure is a collection of measurable maps $\psi_{n,i} : (\mathbb{R}^d)^n \to \mathbb{R}^d$,
with $n \ge 1$, and $i = 1, \ldots, n$. The procedure $\{\psi_{n,i}\}$ is weakly consistent for a process
$\{(X_i, Y_i)\}$ if

$$E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi_{n,i}(Y_1, \ldots, Y_n) - X_i| \right] \to 0$$

as $n$ tends to infinity.

Theorem 4 Suppose that $F : \Lambda \to \Lambda$ is a $\mu$-preserving ergodic transformation that admits
a homoclinic coupling $(X, X')$. If the distribution $\eta$ of $\varepsilon_i$ satisfies conditions (13) - (15)
then no denoising procedure is weakly consistent for the process $\{(X_i, Y_i)\}$ defined in (18).

Proof: Assume, without loss of generality, that $X$ is the first component of a homoclinic
coupling $(X, X')$ for $F$. Let $X'_i = F^i X'$ and $Y'_i = X'_i + \varepsilon_i$ for $i \in \mathbb{Z}$. Fix a denoising scheme
$\{\psi_{n,i}\}$ and assume by way of contradiction that

$$E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi_{n,i}(Y_1, \ldots, Y_n) - X_i| \right] \to 0. \tag{19}$$

The joint distribution of $\{(X_i, Y_i)\}$ is the same as that of $\{(X'_i, Y'_i)\}$ and, therefore, (19)
implies that

$$E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi_{n,i}(Y'_1, \ldots, Y'_n) - X'_i| \right] \to 0. \tag{20}$$

For each $n \ge 1$ define

$$\phi_n(v_{-n}, \ldots, v_n) = \frac{1}{n} \sum_{i=1}^{n} \psi_{n,i}(v_{1-i}, \ldots, v_{n-i}).$$

The stationarity of $\{(X_i, Y_i)\}$ implies that

$$\begin{aligned}
E|\phi_n(Y_{-n}, \ldots, Y_n) - X|
&= E\left| \frac{1}{n} \sum_{i=1}^{n} \psi_{n,i}(Y_{1-i}, \ldots, Y_{n-i}) - X_0 \right| \\
&\le \frac{1}{n} \sum_{i=1}^{n} E|\psi_{n,i}(Y_{1-i}, \ldots, Y_{n-i}) - X_0| \\
&= \frac{1}{n} \sum_{i=1}^{n} E|\psi_{n,i}(Y_1, \ldots, Y_n) - X_i|,
\end{aligned}$$

which tends to zero by (19). An analogous argument using (20) shows that

$$E|\phi_n(Y'_{-n}, \ldots, Y'_n) - X'| \to 0.$$

If $H$ is the event that $(X, X')$ is strongly homoclinic for $F$ then, letting $v_i^j = v_i, \ldots, v_j$,

$$\begin{aligned}
0 &= \lim_{n \to \infty} E\left[ \, |\phi_n(Y_{-n}^{n}) - X| + |\phi_n((Y')_{-n}^{n}) - X'| \, \right] \\
&\ge \liminf_{n \to \infty} E\left[ \left( |\phi_n(Y_{-n}^{n}) - X| + |\phi_n((Y')_{-n}^{n}) - X'| \right) \cdot I_H \right].
\end{aligned}$$

It follows from Theorem 3 and the assumption that $P(H) > 0$ that the last term above
is positive. As this leads to an evident contradiction, (19) cannot hold, and the proof is
complete.

5.4 Some Refinements 

The proof of Theorem 4 shows that the values of $X_1, X_2, \ldots$ are not estimable, even if one
is given access to the entire sequence $\ldots, Y_{-1}, Y_0, Y_1, \ldots$ generated by $X$ and the noise. In
particular, there is no function $\psi : \mathcal{X} \to \mathbb{R}^d$ such that

$$E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi(\ldots, Y_{i-1}, Y_i, Y_{i+1}, \ldots) - X_i| \right] \to 0$$

as $n$ tends to infinity.

Another question that arises is how Theorem 4 bears on the problem of denoising a
trajectory arising from a fixed (non-random) initial vector $x \in \Lambda$. It follows immediately
that if $x_i = F^i x$ and $y_i = x_i + \varepsilon_i$, then there is no denoising procedure such that

$$E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi_{n,i}(y_1, \ldots, y_n) - x_i| \right] \to 0$$

for $\mu$-almost every initial state $x \in \Lambda$. For denoising procedures satisfying a natural
fading-memory property, this conclusion may be strengthened. Let us say that a procedure $\{\psi_{n,i}\}$
has fading memory if, with $Y_i$ defined as in (18), for each $k \ge 1$,

$$\lim_{n \to \infty} E\left[ \frac{1}{n} \sum_{i=k+1}^{n} |\psi_{n-k,i-k}(Y_{k+1}, \ldots, Y_n) - \psi_{n,i}(Y_1, \ldots, Y_n)| \right] = 0.$$

Averaging methods such as Smoothing Algorithm D possess the fading memory property.
Under the conditions of Theorem 4 it can be shown that if $\{\psi_{n,i}\}$ has fading memory, then

$$\limsup_{n \to \infty} E\left[ \frac{1}{n} \sum_{i=1}^{n} |\psi_{n,i}(y_1, \ldots, y_n) - x_i| \right] > 0$$

for $\mu$-almost every initial state $x \in \Lambda$. Thus successful denoising is not possible starting
from almost any initial state.



6 Proof of Theorems 1 and 2

6.1 McDiarmid's Inequality

McDiarmid's inequality is a special case of what is known as the concentration of measure
phenomenon. The basic idea is the following. If $f(x_1, \ldots, x_n)$ is a function that does not
depend too strongly on the value of any single argument, and if $X_1, \ldots, X_n$ are independent
random variables, then $f(X_1, \ldots, X_n)$ will be close to $Ef(X_1, \ldots, X_n)$ with high probability.
Put another way, the distribution of $f(X_1^n)$ will be concentrated around its mean. For a
proof and discussion of the following result, see PHIITE].

Theorem B (McDiarmid) Let $X_1, \ldots, X_n$ be independent random variables taking values
in a set $A \subseteq \mathbb{R}$ and let $f : A^n \to \mathbb{R}$. For $i = 1, \ldots, n$ define

$$v_i = \sup |f(x_1^n) - f(x_1^{i-1}, x'_i, x_{i+1}^n)|, \tag{21}$$

where the supremum is over all numbers $x_1, \ldots, x_n, x'_i \in A$. Then for every $t > 0$

$$P\{|f(X_1^n) - Ef(X_1^n)| > t\} \le 2 \exp\left\{ \frac{-2t^2}{\sum_{i=1}^{n} v_i^2} \right\}. \tag{22}$$
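
As a sanity check on the form of the bound, one can compare it with a simulated tail probability. The sketch below is illustrative only: it takes f to be the sample mean of n independent uniform variables on [0, 1], for which changing a single argument moves f by at most 1/n. The function names `mcdiarmid_bound` and `empirical_tail`, and the values of n, t, and the number of trials, are our own assumptions.

```python
import math
import random

def mcdiarmid_bound(t, coeffs):
    """Right-hand side of McDiarmid's inequality, capped at one."""
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in coeffs)))

def empirical_tail(n, t, trials, seed=0):
    """Fraction of trials in which the mean of n uniforms is more than t from 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) > t:
            hits += 1
    return hits / trials

n, t = 50, 0.1
# Each bounded-difference coefficient of the sample mean is 1/n.
bound = mcdiarmid_bound(t, [1.0 / n] * n)
freq = empirical_tail(n, t, trials=2000)
```

With these values the exponent equals -1, so the bound is 2/e, while the simulated frequency is far smaller. The inequality is not tight here, but it requires no information about the distribution beyond independence and the bounded-difference coefficients.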
6.2 Analysis of Stochastic Variability

Here we derive exponential inequalities for the final term in (6), which governs the stochastic
variability of the estimate $\hat{x}_{l,n}$. Define $U_n(l, k) = \sum_{j \in A_n(l,k)} \varepsilon_j$.

Lemma 2 If $H(\Delta/5) < k < n/2$ and $k \le l \le n - k$ then

$$U_n(l, k) = \sum_{j=k}^{n-k} \varepsilon_j \, I\{|x_l - x_j| \le \Delta/5\} \prod_{1 \le |s| \le k} I\{|y_{l+s} - y_{j+s}| \le 3\Delta/5\}.$$

Proof: Note that $U_n(l, k)$ can be written in the form

$$U_n(l, k) = \sum_{j=k}^{n-k} \varepsilon_j \prod_{|s| \le k} I\{|y_{l+s} - y_{j+s}| \le 3\Delta/5\}.$$

Fix $j$ and define the quantities

$$W_0 = \prod_{|s| \le k} I\{|y_{l+s} - y_{j+s}| \le 3\Delta/5\}$$

and

$$W_1 = I\{|x_l - x_j| \le \Delta/5\} \prod_{1 \le |s| \le k} I\{|y_{l+s} - y_{j+s}| \le 3\Delta/5\}.$$

It suffices to show that $W_0 = W_1$. If $|x_l - x_j| \le \Delta/5$ then $|y_l - y_j| \le 3\Delta/5$ and the desired
equality is immediate. Suppose then that $|x_l - x_j| > \Delta/5$, in which case $W_1 = 0$. If in
addition $W_0 = 1$, then $|x_{l+s} - x_{j+s}| \le \Delta$ for $|s| \le k$, which implies that $|x_l - x_j| \le H^{-1}(k) \le
\Delta/5$. As this is a contradiction, $W_0$ must be zero, and the lemma is established.

Lemma 3 Let L = A/5 be an upper bound on \ei\. Fix n > 1 and integers l,k satisfying 
the conditions of Leninia\^ Then for every t > 0, 

( —2t^ At 

P{\Un{lM>t} < 2exp<^— 2-— — -^ + 



nL2(2A; + l)2 nL{2k + 1) j ' 
and in particular 



F{|(/„(U)I > *} < 2exp{^;^^^g!^ 



fort>2L{2k + l). 



Proof: Define $U$ by excluding the indices $j = l-k, \ldots, l+k$ from the sum defining $U_n(l,k)$;
more precisely,
\[
U = \left( \sum_{j=k}^{l-k-1} + \sum_{j=l+k+1}^{n-k} \right) \varepsilon_j \, I\{|x_l - x_j| \le \Delta/5\} \prod_{1 \le |s| \le k} I\{|y_{l+s} - y_{j+s}| \le 3\Delta/5\},
\]
with the understanding that the first sum is zero if $l \le 2k$, and the second sum is zero if
$l \ge n - 2k$. Then $|U_n(l,k) - U| \le (2k+1)L$, and as $\varepsilon_j$ is independent of the other factors
in the $j$'th summand, $EU = 0$. Suppose for the moment that the values of $\varepsilon_{l-k}, \ldots, \varepsilon_{l+k}$
have been fixed. In this case $y_{l-k}, \ldots, y_{l+k}$ are fixed, and $U$ is a function of $n - (2k+1)$
independent random variables $\Theta = \{\varepsilon_j : j = 1, \ldots, l-k-1, l+k+1, \ldots, n\}$. Let $f$ be
such that $U = f(\Theta)$. Changing any $\varepsilon_j \in \Theta$ will change $y_j$, and can affect at most $2k+1$
terms in the sum defining $U$; thus the coefficient $v_j$ defined in (21) is at most $(2k+1)L$. As
$E(U \mid \varepsilon_{l-k}^{l+k}) = EU = 0$, McDiarmid's inequality implies that
\[
P\{|U| > t \mid \varepsilon_{l-k}^{l+k}\}
\le 2 \exp\left\{ \frac{-2t^2}{\left( \sum_{j=1}^{l-k-1} + \sum_{j=l+k+1}^{n} \right) ((2k+1)L)^2} \right\}
\le 2 \exp\left\{ \frac{-2t^2}{n(2k+1)^2 L^2} \right\}.
\]
Taking expectations, the same inequality holds for $P\{|U| > t\}$. The first of the stated
inequalities follows from the fact that $|U_n(l,k) - U| \le (2k+1)L$, and the second follows
from the first by a straightforward calculation.
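
The role of McDiarmid's inequality in the argument above can be checked numerically on a toy case. The following Python sketch is an illustration only: the function names, the uniform noise distribution, and the parameter values are assumptions made for the example and are not taken from the paper. It evaluates the right-hand side of (22) and the first bound of Lemma 3, and compares (22) with a simulated tail frequency for a plain sum of independent noise terms bounded by $L$.

```python
import math
import random

def mcdiarmid_bound(t, v):
    """Right-hand side of (22) for bounded-difference coefficients v_1, ..., v_n."""
    return 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in v))

def lemma3_bound(t, n, k, L):
    """First bound of Lemma 3 on P{|U_n(l,k)| > t}."""
    c = L * (2 * k + 1)
    return 2.0 * math.exp(-2.0 * t * t / (n * c * c) + 4.0 * t / (n * c))

def simulated_tail(t, n, L, trials=5000, seed=0):
    """Empirical frequency of |sum of n Uniform[-L, L] terms| > t (toy example)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.uniform(-L, L) for _ in range(n))
        if abs(total) > t:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n, L, t = 100, 0.2, 4.0
    # Changing one term of the plain sum moves it by at most 2L.
    print(simulated_tail(t, n, L), mcdiarmid_bound(t, [2 * L] * n))
    print(lemma3_bound(200.0, 1000, 2, L))
```

In this toy setting the simulated frequency sits far below the bound, which is expected: McDiarmid's inequality uses only the bounded-difference coefficients and not the actual variance of the noise.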

Definition: Let $V_n(l,k) = |A_n(l,k)|^{-1} \sum_{j \in A_n(l,k)} \varepsilon_j$ be the stochastic term appearing in
©. For each $m \ge 1$ and $1 \le k < n/2$ define
\[
L_n(m,k) = \{ l : |A_n(l,k)| \ge m \text{ and } k \le l \le n-k \}
\]
to be the set of indices $l$ for which at least $m$ length-$k$ matches are found.
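
The quantities in this definition are easy to compute directly. The Python sketch below is a minimal illustration under stated assumptions: the match set follows the window-matching form used in the proof of Lemma 2 (all $2k+1$ observations within $3\Delta/5$), the function names are hypothetical, and the test series is a toy period-4 sequence with uniform noise rather than the trajectory of an expansive map. The noise bound $0.15$ is chosen below $\Delta/5 = 0.2$ only to keep the toy example away from ties.

```python
import random

def match_set(y, l, k, delta):
    """A_n(l, k): indices j in [k, n-k] with |y[l+s] - y[j+s]| <= 3*delta/5 for all |s| <= k."""
    n = len(y) - 1
    tol = 3.0 * delta / 5.0
    return [j for j in range(k, n - k + 1)
            if all(abs(y[l + s] - y[j + s]) <= tol for s in range(-k, k + 1))]

def stochastic_terms(eps, matches):
    """Return (U_n, V_n): the noise sum over the match set and its average."""
    u = sum(eps[j] for j in matches)
    return u, u / len(matches)

def well_matched_indices(y, k, m, delta):
    """L_n(m, k): indices l in [k, n-k] whose match set has at least m elements."""
    n = len(y) - 1
    return [l for l in range(k, n - k + 1)
            if len(match_set(y, l, k, delta)) >= m]

def toy_series(n, delta=1.0, noise=0.15, seed=1):
    """Hypothetical example: a period-4 series observed with bounded uniform noise."""
    rng = random.Random(seed)
    x = [float(i % 4) * delta for i in range(n + 1)]
    eps = [rng.uniform(-noise, noise) for _ in range(n + 1)]
    y = [xi + ei for xi, ei in zip(x, eps)]
    return x, eps, y

if __name__ == "__main__":
    x, eps, y = toy_series(200)
    A = match_set(y, 50, 2, 1.0)
    print(len(A), stochastic_terms(eps, A))
    print(len(well_matched_indices(y, 2, 40, 1.0)))
```

In the toy series two windows match exactly when their indices agree modulo 4, so every match set contains about a quarter of the admissible indices and the average $V_n(l,k)$ is small.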

As an immediate corollary of Lemma 3 we may derive bounds on the probability that
one of the terms $V_n(l,k)$ with $|A_n(l,k)| \ge m$ exceeds a given constant $\delta > 0$. In particular,
treating a maximum over the empty set as zero, we find that
\begin{align*}
P\Big\{ \max_{l \in L_n(m,k)} |V_n(l,k)| > \delta \Big\}
 &= P\Big\{ \max_{l \in L_n(m,k)} \frac{|U_n(l,k)|}{|A_n(l,k)|} > \delta \Big\} \\
 &\le P\Big\{ \max_{l \in L_n(m,k)} |U_n(l,k)| > \delta m \Big\} \\
 &\le n \cdot \max_{k \le l \le n-k} P\{ |U_n(l,k)| > \delta m \} \\
 &\le 2n \exp\left\{ \frac{-2\delta^2 m^2}{nL^2(2k+1)^2} + \frac{4\delta m}{nL(2k+1)} \right\} \tag{23}
\end{align*}



/ -^ 



^m^ 
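
To get a feeling for the size of the union bound (23), one can tabulate its right-hand side for growing $n$. The Python sketch below does this under assumed, purely illustrative choices ($L = 0.2$, $\delta = 0.1$, $k = \lceil \log n \rceil$, $m = n^{0.9}$); none of these values come from the paper, and the function name is hypothetical.

```python
import math

def union_bound_23(n, k, m, delta, L):
    """Right-hand side of (23): 2n * exp(-2 (delta m)^2 / (n c^2) + 4 delta m / (n c)), c = L(2k+1)."""
    c = L * (2 * k + 1)
    dm = delta * m
    return 2.0 * n * math.exp(-2.0 * dm * dm / (n * c * c) + 4.0 * dm / (n * c))

if __name__ == "__main__":
    L, delta, beta = 0.2, 0.1, 0.9    # illustrative values only
    for n in (10**3, 10**4, 10**5, 10**6):
        k = math.ceil(math.log(n))    # window half-width of order log n
        m = n ** beta                 # required number of matches
        print(n, k, union_bound_23(n, k, m, delta, L))
```

For small $n$ the bound exceeds one and carries no information; it becomes useful only once $\delta m$ is large compared with $\sqrt{n}\,L(2k+1)$, after which it decays rapidly in $n$.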



Inequality (24) is used below, in conjunction with the Borel-Cantelli Lemma, to establish
the almost sure consistency of the estimates $\hat{x}_{l,n}$ and $\tilde{x}_{l,n}$. Neither result makes full use of
the inequality, which shows, for example, that for each $\alpha \in (0, 1/2)$,
\[
n^{\alpha} \cdot \max_{l \in L_n(m,k)} \frac{|U_n(l,k)|}{|A_n(l,k)|} \to 0 \tag{25}
\]
with probability one, provided that $k = O(\log n)$ and $m \ge n^{\beta}$ with $\beta \in (\alpha + 1/2, 1)$. The
next lemma appears in [24]; we include the proof for completeness.

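Lemma 4 If $k = o(\log n)$ then for every $\epsilon > 0$,
\[
\frac{1}{n} \sum_{j=0}^{n} I\{ |A_n(j,k)| \le n^{1-\epsilon} \} \to 0 \quad \text{as } n \to \infty.
\]

Proof: As $\Lambda$ is compact, there exists a finite set $S \subseteq \Lambda$ such that
\[
\max_{u \in \Lambda} \min_{v \in S} |u - v| \le \frac{\Delta}{10}.
\]
Let $S^{2k+1}$ be the collection of sequences $s = (s_{-k}, \ldots, s_k)$ with $s_i \in S$. For each $x \in \Lambda$
there is some $s \in S^{2k+1}$ such that $\max_{|i| \le k} |s_i - F^i x| \le \Delta/10$. Thus if we define
\[
J_n(s) = \Big\{ j : 0 \le j \le n \text{ and } \max_{|i| \le k} |s_i - F^{i+j} x| \le \frac{\Delta}{10} \Big\}, \qquad s \in S^{2k+1},
\]
then each integer $j = k, \ldots, n-k$ is contained in at least one set $J_n(s)$. Moreover, if
$j_1, j_2 \in J_n(s)$ then
\[
\max_{|i| \le k} |x_{j_1+i} - x_{j_2+i}| \le \frac{\Delta}{5} \quad \text{and} \quad \max_{|i| \le k} |y_{j_1+i} - y_{j_2+i}| \le \frac{3\Delta}{5},
\]
and therefore $j_1 \in A_n(j_2,k)$ and $j_2 \in A_n(j_1,k)$. It follows from this last observation that
$|A_n(j,k)| \le N$ and $j \in J_n(s)$ imply $|J_n(s)| \le N$. Fix $0 < \epsilon < 1$. As $k = o(\log n)$,
$|S^{2k+1}| = |S|^{2k+1} = o(n^{\epsilon/2})$. Let $\sum_s$ denote the sum over $S^{2k+1}$. When $n$ is sufficiently large,
\begin{align*}
\sum_{j=0}^{n} I\{ |A_n(j,k)| \le n^{1-\epsilon} \}
 &\le \sum_{j=0}^{n} \sum_{s} I\{ |A_n(j,k)| \le n^{1-\epsilon} \} \, I\{ j \in J_n(s) \} \\
 &= \sum_{s} \sum_{j=0}^{n} I\{ |A_n(j,k)| \le n^{1-\epsilon} \} \, I\{ j \in J_n(s) \} \\
 &\le \sum_{s} |J_n(s)| \, I\{ |J_n(s)| \le n^{1-\epsilon} \} \le n^{1-\epsilon} \cdot |S|^{2k+1} \\
 &\le n^{1 - \epsilon/2}.
\end{align*}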



As the last term above is $o(n)$, the result follows.

Proof of Theorem 1: Fix $\beta \in (1/2, 1)$. The stochastic term in inequality ((T)) can be
bounded as follows:
\[
\frac{1}{n-2k} \sum_{l=k}^{n-k} \frac{\big| \sum_{j \in A_n(l,k)} \varepsilon_j \big|}{|A_n(l,k)|}
\le \max_{l \in L_n(n^{\beta},k)} \frac{\big| \sum_{j \in A_n(l,k)} \varepsilon_j \big|}{|A_n(l,k)|}
+ \frac{\Delta}{5(n-2k)} \sum_{l=k}^{n-k} I\{ |A_n(l,k)| < n^{\beta} \}.
\]
Inequality (24) ensures that the first term on the right hand side tends to zero with
probability one. The second term tends to zero by Lemma 4.

Proposition 3 Let the window widths $k_{l,n}$ be defined as in |^. If $x$ is strongly recurrent
then $\min\{ k_{j,n} : \log n \le j \le n - \log n \} \to \infty$ as $n \to \infty$.

Proof: Let $\mathcal{O}$ be a given finite cover of $\Lambda$ by sets having diameter less than $\Delta/5$. Fix $K \ge 1$
and define $\gamma$ to be the set of all Cartesian products $O_{-k} \times \cdots \times O_k$ with $1 \le k \le K$ and
such that each $O_i \in \mathcal{O}$. Let $C_k$ denote any product of $2k+1$ sets from $\mathcal{O}$. As $x$ is assumed
to be strongly recurrent, $\gamma = \gamma_0 \cup \gamma_1$ where
\[
\gamma_0 = \bigcup_{k=1}^{K} \Big\{ C_k : \sum_{i=k}^{\infty} I\{ x_{i-k}^{i+k} \in C_k \} < \infty \Big\},
\qquad
\gamma_1 = \bigcup_{k=1}^{K} \Big\{ C_k : \liminf_{n} \frac{1}{n} \sum_{i=k}^{n-k} I\{ x_{i-k}^{i+k} \in C_k \} > 0 \Big\}.
\]
As $\gamma_0$ is finite, there exists an integer $N < \infty$ such that $x_{j-k}^{j+k} \in C_k \in \gamma_1$ for every $k \le K$
and every $j \ge \log N$. Moreover, if $x_{i-k}^{i+k}$ and $x_{j-k}^{j+k}$ lie in the same set $C_k \in \gamma_1$ and $\log n \le
i, j \le n - \log n$ then it is clear that $i \in A_n(j,k)$. Thus when $n \ge N$,
\[
|A_n(j,k)| \ge \min_{C_k \in \gamma_1} \sum_{i=k}^{n-k} I\{ x_{i-k}^{i+k} \in C_k \} \quad \text{for } k \le K \text{ and } j = \log n, \ldots, n - \log n.
\]
The definition of $\gamma_1$ ensures that $|A_n(j,k)| \ge n/\log n$ for $n$ sufficiently large and $k, j$ as
above. Therefore $\liminf_n \min_j k_{j,n} \ge K$ and the result follows as $K$ was arbitrary.

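Proof of Theorem 2: Let $\kappa = \log n$ and $m = n/\log n$. It follows from inequality © and
the definition of $k_{l,n}$ that
\begin{align*}
\max_{\kappa \le l \le n-\kappa} |x_l - \tilde{x}_{l,n}|
 &\le \max_{\kappa \le l \le n-\kappa} H^{-1}(k_{l,n}) + \max_{\kappa \le l \le n-\kappa} \frac{\big| \sum_{j \in A_n(l,k_{l,n})} \varepsilon_j \big|}{|A_n(l,k_{l,n})|} \\
 &\le H^{-1}\Big( \min_{\kappa \le l \le n-\kappa} k_{l,n} \Big) + \max_{1 \le k \le \kappa} \, \max_{l \in L_n(m,k)} \frac{\big| \sum_{j \in A_n(l,k)} \varepsilon_j \big|}{|A_n(l,k)|}.
\end{align*}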

If $x$ is strongly recurrent then the first term on the right hand side tends to zero by an
application of Proposition Q and Proposition 3. Inequality (24) and a standard Borel-Cantelli
argument show that the second term tends to zero with probability one.

7 Proof of Theorem 3

Throughout this section $(x, x')$ is a fixed strongly homoclinic pair for $F$. Define $x_i = F^i x$,
$x_i' = F^i x'$, $y_i = x_i + \varepsilon_i$ and $y_i' = x_i' + \varepsilon_i$ as above. As $(x, x')$ is strongly homoclinic,
\[
\sum_{i \in \mathbb{Z}} |x_i - x_i'| < \infty. \tag{26}
\]

Lemma 5 If conditions (14) and (15) hold, then there exist sets $A_i^* \subseteq \mathbb{R}^d$, $i \in \mathbb{Z}$, such that

a. $A_i^* \subseteq (S + x_i) \cap (S + x_i')$ for each $i$, and

b. $P\{ y_i \in A_i^* \text{ and } y_i' \in A_i^* \text{ for all } i \in \mathbb{Z} \} > 0$.

Proof: For each $i \in \mathbb{Z}$ define $A_i = (S + x_i) \cap (S + x_i')$. Note that
\begin{align*}
P\{ y_i \notin A_i \text{ or } y_i' \notin A_i \}
 &\le P\{ y_i \notin A_i \} + P\{ y_i' \notin A_i \} \\
 &= P\{ \varepsilon_i \notin (S + (x_i' - x_i)) \} + P\{ \varepsilon_i \notin (S + (x_i - x_i')) \} \\
 &= \eta( S \setminus (S + (x_i' - x_i)) ) + \eta( S \setminus (S + (x_i - x_i')) ) \\
 &\le 2\, \eta\big( (\partial S)^{|x_i - x_i'|} \big).
\end{align*}
Assumption (flU) implies that $\eta( (\partial S)^{|x_i - x_i'|} ) \le c\,|x_i - x_i'|$ for some constant $c < \infty$, and it
then follows from (26) that
\[
\sum_{i \in \mathbb{Z}} P\{ y_i \notin A_i \text{ or } y_i' \notin A_i \} < \infty.
\]
By an application of the Borel-Cantelli Lemma, there exists an integer $N$ such that
\[
P\{ y_i \in A_i \text{ and } y_i' \in A_i \text{ for all } |i| \ge N \} \ge 1/2. \tag{27}
\]
Define $A_i^* = A_i$ for $|i| \ge N$. Clearly (a) holds for each $|i| \ge N$.

It remains to select sets $A_i^*$ for $|i| < N$. To this end, let $v^* \in \mathbb{R}^d$ be any vector such that
for some $\delta > 0$
\[
\sup_{v \in \Lambda} |v - v^*| \le \rho - \delta
\]
and define $A_i^* = B(v^*, (\rho + \delta)/2)$ for $|i| < N$. Then for each $v \in \Lambda$,
\[
\sup_{u \in (A_i^* - v)} |u| \le \frac{\rho + \delta}{2} + |v^* - v| \le \frac{3}{2}\rho,
\]
which implies that $(A_i^* - v) \subseteq B(0, 3\rho/2) \subseteq S$. Thus (a) holds for $|i| < N$. Moreover, for
each such $i$,
\begin{align*}
P\{ y_i \in A_i^* \text{ and } y_i' \in A_i^* \} &= P\{ \varepsilon_i \in (A_i^* - x_i) \cap (A_i^* - x_i') \} \\
 &= \eta\big( (A_i^* - x_i) \cap (A_i^* - x_i') \big).
\end{align*}
The inequality $\big|\, |v^* - x_i| - |v^* - x_i'| \,\big| \le |x_i - x_i'| < \rho$ implies that $(A_i^* - x_i) \cap (A_i^* - x_i')$ has
positive Lebesgue measure. As the intersection is also contained in $S$, the last probability
above is greater than zero. Conclusion (b) of the lemma follows from this observation and
(27), as the $\varepsilon_i$'s are independent.
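
The first step of the proof can be made concrete in a simple special case. The Python sketch below assumes, purely for illustration, that the noise is uniform on $S = [-L, L]$ in one dimension, so that $\eta(S \setminus (S + z)) = \min(|z|, 2L)/(2L)$ and the linear bound holds with $c = 1/(2L)$. The function names and the geometric sequence of gaps $|x_i - x_i'|$ are hypothetical choices, not quantities from the paper.

```python
def shift_loss_uniform(z, L):
    """eta(S minus (S + z)) when eta is uniform on S = [-L, L] (one dimension)."""
    return min(abs(z), 2.0 * L) / (2.0 * L)

def total_mismatch(gaps, L):
    """Sum over i of the bound 2 * eta(S minus (S + gap_i)) on P{y_i or y'_i falls outside A_i}."""
    return sum(2.0 * shift_loss_uniform(g, L) for g in gaps)

if __name__ == "__main__":
    L = 0.2
    # Hypothetical gaps |x_i - x'_i| = 0.1 * 2^{-|i|}, which are summable as in (26).
    gaps = [0.1 * 0.5 ** abs(i) for i in range(-50, 51)]
    print(total_mismatch(gaps, L))
```

Because the gaps are summable, so are the mismatch probabilities, which is what allows the choice of $N$ in (27).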

Let $Q$ and $Q'$ be probability measures on $(\mathcal{X}, \mathcal{S})$ equal to the respective probability
distributions of the random elements $y$ and $y'$. Using the sets $A_i^*$ from Lemma 5 define
the Cartesian product
\[
\Gamma = \prod_{i \in \mathbb{Z}} A_i^* \in \mathcal{S}. \tag{28}
\]
It follows from part (b) of Lemma 5 that $Q(\Gamma), Q'(\Gamma) > 0$.

Lemma 6 The measures $Q$ and $Q'$ are mutually absolutely continuous on $\Gamma$: for each
$B \in \mathcal{S}$ contained in $\Gamma$, $Q(B) = 0$ if and only if $Q'(B) = 0$.

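Proof: Let $\mathcal{S}_n \subseteq \mathcal{S}$ denote the sigma field generated by the coordinate functions $\pi_i(x) = x_i$,
with $|i| \le n$. Let $Q_n$ and $Q_n'$ be the restrictions of $Q$ and $Q'$ to $\mathcal{S}_n$, respectively. Then
clearly
\[
dQ_n(v) = \prod_{i=-n}^{n} f(v_i - x_i) \, dv_{-n} \cdots dv_n \quad \text{and} \quad dQ_n'(v) = \prod_{i=-n}^{n} f(v_i - x_i') \, dv_{-n} \cdots dv_n.
\]
Furthermore, Lemma 5 ensures that $Q_n$ and $Q_n'$ are mutually absolutely continuous on $\Gamma$,
with derivative
\[
\frac{dQ_n}{dQ_n'}(v) = \prod_{i=-n}^{n} \frac{f(v_i - x_i)}{f(v_i - x_i')}.
\]
For each $n \ge 1$ let $\Gamma_n = \{ v : v_i \in A_i^* \text{ for } |i| \le n \}$, and define the $\mathcal{S}_n$-measurable function
\[
R_n(v) = \frac{dQ_n}{dQ_n'}(v) \cdot I\{ v \in \Gamma_n \}.
\]
Suppose that $B \in \mathcal{S}_n$. Then clearly $B \cap \Gamma_{n+1} \in \mathcal{S}_{n+1}$ and $B \cap \Gamma_n \in \mathcal{S}_n$, and therefore
\begin{align*}
\int_B R_{n+1} \, dQ' &= \int_{B \cap \Gamma_{n+1}} \frac{dQ_{n+1}}{dQ_{n+1}'} \, dQ' = Q_{n+1}(B \cap \Gamma_{n+1}) \\
 &= Q(B \cap \Gamma_{n+1}) \le Q(B \cap \Gamma_n) = \int_B R_n \, dQ'.
\end{align*}
Thus $(R_n, \mathcal{S}_n)$ is a non-negative super-martingale. By the martingale convergence theorem,
$R_n$ converges with $Q'$-probability one to a non-negative random variable $R^*$.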

We now wish to establish the following relation, which will imply that $Q' \ll Q$ on $\Gamma$
(see the argument below):
\[
Q'\{ v \in \Gamma : R^*(v) = 0 \} = 0. \tag{29}
\]
By condition (13) there exist numbers $\delta_0 > 0$ and $c < \infty$ such that
\[
\int_{S \cap (S - z)} \left| \log \frac{f(w + z)}{f(w)} \right| f(w) \, dw \le c\,|z| \tag{30}
\]
whenever $|z| \le \delta_0$. By (26) there is an integer $m$ such that $|x_i - x_i'| \le \delta_0$ for $|i| \ge m$. As
$R_m(v) > 0$ for each $v \in \Gamma$, the equality (29) will follow from
\[
\int_{\Gamma} \left| \log \frac{R^*}{R_m} \right| dQ' < \infty. \tag{31}
\]



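To establish (31), note that by Fatou's lemma
\[
\int_{\Gamma} \left| \log \frac{R^*}{R_m} \right| dQ'
 = \int_{\Gamma} \liminf_{n \to \infty} \left| \log \frac{R_n}{R_m} \right| dQ'
 \le \liminf_{n \to \infty} \int_{\Gamma} \left| \log \frac{R_n}{R_m} \right| dQ'
 \le \liminf_{n \to \infty} \int_{\Gamma_n} \left| \log \frac{R_n}{R_m} \right| dQ'.
\]
Moreover, for each $n > m$,
\[
\int_{\Gamma_n} \left| \log \frac{R_n}{R_m} \right| dQ_n'
 \le \sum_{m < |i| \le n} \int_{A_i^*} \left| \log \frac{f(v_i - x_i)}{f(v_i - x_i')} \right| f(v_i - x_i') \, dv_i.
\]
By an elementary change of variables, our choice of $m$ and the inequality (30) imply that
\[
\int_{A_i^*} \left| \log \frac{f(v_i - x_i)}{f(v_i - x_i')} \right| f(v_i - x_i') \, dv_i \le c\,|x_i - x_i'|.
\]
Combining the results of the last three displays, it follows that
\[
\int_{\Gamma} \left| \log \frac{R^*}{R_m} \right| dQ' \le c \sum_{|i| > m} |x_i - x_i'|.
\]
The sum is finite by (26), which establishes (31) and the relation (29).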

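Suppose now that $B \in \mathcal{S}$ is such that $B \subseteq \Gamma$ and $Q'(B) > 0$. For $n \ge 1$ define events
$B_n = \{ v : \exists\, v' \in B \text{ s.t. } v_i = v_i' \text{ for } |i| \le n \} \supseteq B$. By another application of Fatou's Lemma,
\begin{align*}
Q(B) = \lim_{n} Q(B_n) = \liminf_{n} Q_n(B_n) &= \liminf_{n} \int \frac{dQ_n}{dQ_n'} I_{B_n} \, dQ' \\
 &\ge \int \liminf_{n \to \infty} \frac{dQ_n}{dQ_n'} I_{B_n} \, dQ' \ge \int_B R^* \, dQ'.
\end{align*}
The last inequality above follows from the definition of $R^*$ and the fact that $B_n \supseteq B$.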

As $Q'(B) > 0$, the relation (29) implies that the last integral and $Q(B)$ are positive. Thus
$Q' \ll Q$ on $\Gamma$. An identical argument, exchanging the roles of $Q$ and $Q'$, shows that $Q \ll Q'$
on $\Gamma$ as well.

Lemma 7 If $(x, x')$ is homoclinic and conditions (13)-(15) hold, then for every measurable
function $\phi : \mathcal{X} \to \mathbb{R}^d$,
\[
E\big[ \, |\phi(y) - x| + |\phi(y') - x'| \, \big] \ge |x - x'| \int_{\Gamma} \min\left\{ \frac{dQ'}{dQ}, 1 \right\} dQ > 0, \tag{32}
\]
where $\Gamma \subseteq \mathcal{X}$ is defined as in (28).



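Proof: Lemma 6 shows that $Q' \ll Q$ on $\Gamma$. Let $(dQ'/dQ)(v)$ be the associated derivative,
which is defined for each $v \in \Gamma$. The expectation above can be written equivalently as
\begin{align*}
\int |\phi - x| \, dQ + \int |\phi - x'| \, dQ'
 &\ge \int_{\Gamma} |\phi - x| \, dQ + \int_{\Gamma} |\phi - x'| \, dQ' \\
 &= \int_{\Gamma} \left[ \, |\phi - x| + |\phi - x'| \frac{dQ'}{dQ} \right] dQ \\
 &\ge |x - x'| \int_{\Gamma} \min\left\{ \frac{dQ'}{dQ}, 1 \right\} dQ.
\end{align*}
As $(dQ'/dQ)(v)$ is positive for $Q$-almost every $v \in \Gamma$, and $Q(\Gamma) > 0$, the last integral is
positive.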

The lower bound in Lemma 7 bears further discussion. Suppose for the moment that
the distribution $\eta$ of the noise satisfies (13) and is supported on all of $\mathbb{R}^d$, which is the case
if the $\varepsilon_i$ are Gaussian. Then we may take $A_i^* = \mathbb{R}^d$ for each $i$, so that $\Gamma = \mathcal{X}$. In this case,
further evaluation leads to a simplification of the integral in (32):
\[
\int \min\left\{ \frac{dQ'}{dQ}, 1 \right\} dQ = 1 - \|Q - Q'\|.
\]
Here $\|Q - Q'\| = \sup_{B \in \mathcal{S}} |Q(B) - Q'(B)|$ is the total variation distance between $Q$ and
$Q'$. As $Q$ and $Q'$ are mutually absolutely continuous, $\|Q - Q'\| < 1$ and we see again that
the lower bound in Lemma 7 is positive. When $\Gamma \ne \mathcal{X}$ one may derive a similar, but more
complicated, expression for the integral in (32).
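
The identity between the integral in (32) and $1 - \|Q - Q'\|$ can be verified exactly when the sample space is finite. The Python sketch below is a toy check under that assumption: $Q$ and $Q'$ are replaced by two hypothetical probability vectors with strictly positive entries, and the function names are illustrative.

```python
from fractions import Fraction

def min_integral(q, q_prime):
    """Sum over the sample space of min(dQ'/dQ, 1) dQ, assuming every q[i] > 0."""
    return sum(min(qp / qi, 1) * qi for qi, qp in zip(q, q_prime))

def total_variation(q, q_prime):
    """sup_B |Q(B) - Q'(B)|, attained at B = {i : q[i] > q_prime[i]}."""
    return sum(qi - qp for qi, qp in zip(q, q_prime) if qi > qp)

if __name__ == "__main__":
    q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    q_prime = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
    print(min_integral(q, q_prime), 1 - total_variation(q, q_prime))
```

Exact rational arithmetic is used so that the two sides agree without rounding error.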

Although no scheme can reliably distinguish between the elements of a strongly homoclinic
pair $(x, x')$ from noisy observations of their trajectories, we may say that a scheme $\phi$
is optimal for this pair if it achieves the lower bound above. One may readily check that
the maximum likelihood scheme
\[
\phi(v) = \begin{cases} x & \text{if } \dfrac{dQ'}{dQ}(v) \le 1 \\ x' & \text{otherwise} \end{cases}
\]
is optimal in this sense.
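
The optimality claim can be checked by brute force in a finite toy model. The Python sketch below assumes a three-point observation space, scalar initial conditions $x = 0$ and $x' = 1$, and hypothetical probability vectors in place of $Q$ and $Q'$; the function names are illustrative. It compares the risk of the likelihood-ratio rule with the lower bound of Lemma 7 (taking $\Gamma$ to be the whole space) and with every rule taking values in $\{x, x'\}$.

```python
from fractions import Fraction
from itertools import product

def risk(rule, q, q_prime, x, x_prime):
    """E|phi(y) - x| + E|phi(y') - x'| for a rule on a finite observation space."""
    return sum(abs(r - x) * qi + abs(r - x_prime) * qp
               for r, qi, qp in zip(rule, q, q_prime))

def ml_rule(q, q_prime, x, x_prime):
    """Guess x where the likelihood ratio q'/q is at most 1, and x' otherwise."""
    return [x if qp <= qi else x_prime for qi, qp in zip(q, q_prime)]

def lower_bound(q, q_prime, x, x_prime):
    """|x - x'| times the sum of min(q'/q, 1) q, i.e. the bound (32) on the whole space."""
    return abs(x - x_prime) * sum(min(qi, qp) for qi, qp in zip(q, q_prime))

if __name__ == "__main__":
    q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    q_prime = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
    x, x_prime = Fraction(0), Fraction(1)
    best = min(risk(r, q, q_prime, x, x_prime)
               for r in product([x, x_prime], repeat=len(q)))
    print(risk(ml_rule(q, q_prime, x, x_prime), q, q_prime, x, x_prime),
          lower_bound(q, q_prime, x, x_prime), best)
```

In this toy model the likelihood-ratio rule attains the lower bound, and no two-valued rule does better.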
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